You are a junior data analyst working on the marketing analyst team at Bellabeat, a high-tech manufacturer of health-focused products for women. Bellabeat is a successful small company, but they have the potential to become a larger player in the global smart device market. Urška Sršen, cofounder and Chief Creative Officer of Bellabeat, believes that analyzing smart device fitness data could help unlock new growth opportunities for the company. You have been asked to focus on one of Bellabeat’s products and analyze smart device data to gain insight into how consumers are using their smart devices. The insights you discover will then help guide marketing strategy for the company. You will present your analysis to the Bellabeat executive team along with your high-level recommendations for Bellabeat’s marketing strategy.
Analyse the dataset to derive some meaningful insights and recommend some measures by which marketing team can imporve their marketing strategy.
What are some trends in smart device usage?
How could these trends apply to Bellabeat customers?
How could these trends help influence Bellabeat marketing strategy?
Urška Sršen: Bellabeat’s cofounder and Chief Creative Officer
Sando Mur: Mathematician and Bellabeat’s cofounder; key member of the Bellabeat executive team
Bellabeat marketing analytics team: A team of data analysts responsible for collecting, analyzing, and reporting data that helps guide Bellabeat’s marketing strategy.
Sršen encourages you to use public data that explores smart device users’ daily habits. She points you to a specific data set:
FitBit Fitness Tracker Data (CC0: Public Domain, dataset made available through Mobius): This Kaggle data set contains personal fitness tracker from thirty fitbit users. Thirty eligible Fitbit users consented to the submission of personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring. It includes information about daily activity, steps, and sleep that can be used to explore users’ habits.
Lack of transparency in terms of data collection
Outdated data (the survey was in 2016) might be the wrong choice for developing a new product in 2021
Sample size: Only 33 users and short observation period of one month (Möbius dataset) is available as compared to two months (original dataset)
SELECT Id, ActivityDate,
COUNT(*) AS NumberOfRows
FROM bellabeat.dailyactivity
GROUP BY ID, ActivityDate
HAVING NumberOfRows > 1;
OUTPUT : Empty table because there are no duplicates
SELECT *,
COUNT(*) AS NumberOfRows
FROM bellabeat.Sleeplog
GROUP BY Id, SleepDay, TotalSleepRecords, TotalTimeInBed, TotalMinutesAsleep
HAVING NumberOfRows > 1;
| Id | SleepDay | TotalSleepRecords | TotalMinutesAsleep | TotalTimeInBed | NumberOfRows |
|---|---|---|---|---|---|
| 4388161847 | 2016-05-05 | 1 | 471 | 495 | 2 |
| 4702921684 | 2016-05-07 | 1 | 520 | 543 | 2 |
| 8378563200 | 2016-04-25 | 1 | 388 | 402 | 2 |
CREATE TABLE SleepLog2
SELECT DISTINCT * FROM SleepLog
SELECT Id, Date,
COUNT(*) AS NumberOfRows
FROM bellabeat.weightlog
GROUP BY Id, Date
HAVING NumberOfRows > 1;
OUTPUT : Empty table because there are no duplicates
#Checking if there any wrong Id in Daily Activity
SELECT Id
FROM bellabeat.dailyactivity
WHERE LENGTH(Id) > 10
OR LENGTH(Id) < 10
#Checking if there any wrong Id in Sleep Log
SELECT Id
FROM bellabeat.sleeplog2
WHERE LENGTH(Id) > 10
OR LENGTH(Id) < 10
#Checking if there any wrong Id in Weight Log
SELECT Id
FROM bellabeat.weightlog
WHERE LENGTH(Id) > 10
OR LENGTH(Id) < 10;
OUTPUT : All three queries returned empty table because there are no wrong ID’s.
SELECT 'TOTAL',
ROUND(SUM(TotalSteps),2) AS Steps,
ROUND(SUM(TotalDistance),2) AS Distance,
ROUND(SUM(Calories),2) AS Calories
FROM bellabeat.dailyactivity
UNION
SELECT 'AVERAGE',
ROUND(AVG(TotalSteps),2) AS Steps,
ROUND(AVG(TotalDistance),2) AS Distance,
ROUND(AVG(Calories),2) AS Calories
FROM bellabeat.dailyactivity
UNION
SELECT 'MIN',
MIN(TotalSteps),
MIN(TotalDistance),
MIN(Calories)
FROM bellabeat.dailyactivity
UNION
SELECT 'MAX',
ROUND(MAX(TotalSteps),2),
ROUND(MAX(TotalDistance),2),
ROUND(MAX(Calories),2)
FROM bellabeat.dailyactivity;
| METRIC | Steps | Distance | Calories |
|---|---|---|---|
| TOTAL | 7179636 | 5160.32 | 2165393.00 |
| AVERAGE | 7637.91 | 5.49 | 2303.61 |
| MIN | 0 | 0 | 0.00 |
| MAX | 36019 | 28.03 | 4900.00 |
SELECT
DAYNAME(ActivityDate) AS DayOfWeek,
ROUND(AVG(TotalSteps),2) AS AvgSteps,
ROUND(AVG(TotalDistance),2) AS AvgDistance,
ROUND(AVG(Calories),2) AS AvgCalories,
ROUND(AVG(TotalMinutesAsleep)/60,2) AS Average_Sleep_Hours
FROM
bellabeat.dailyactivity AS Act
LEFT JOIN
bellabeat.sleeplog2 AS Sleep
ON
Act.ActivityDate = Sleep.SleepDay
GROUP BY DayOfWeek
ORDER BY
CASE
WHEN DayOfWeek = 'Monday' THEN 1
WHEN DayOfWeek = 'Tuesday' THEN 2
WHEN DayOfWeek = 'Wednesday' THEN 3
WHEN DayOfWeek = 'Thursday' THEN 4
WHEN DayOfWeek = 'Friday' THEN 5
WHEN DayOfWeek = 'Saturday' THEN 6
WHEN DayOfWeek = 'Sunday' THEN 7
END ASC;
| DayOfWeek | AvgSteps | AvgDistance | AvgCalories | Average_Sleep_Hours |
|---|---|---|---|---|
| Monday | 7750.91 | 5.53 | 2322.53 | 6.98 |
| Tuesday | 8115.02 | 5.83 | 2355.89 | 6.75 |
| Wednesday | 7593.5 | 5.51 | 2308.71 | 7.24 |
| Thursday | 7635.67 | 5.48 | 2266.94 | 6.67 |
| Friday | 7449.29 | 5.3 | 2330.70 | 6.76 |
| Saturday | 8176.29 | 5.87 | 2357.84 | 6.99 |
| Sunday | 6943.2 | 5.04 | 2262.51 | 7.56 |
#Categorising Sleeper type
CREATE TABLE sleeper_type AS(
SELECT
Id,
ROUND(AVG(TotalMinutesAsleep)/60,2) AS Daily_Sleep,
CASE
WHEN ROUND(AVG(TotalMinutesAsleep)/60,2) < 4
THEN 'Unhealthy Sleeper'
WHEN ROUND(AVG(TotalMinutesAsleep)/60,2) >= 4 AND ROUND(AVG(TotalMinutesAsleep)/60,2)< 6
THEN 'Bad Sleeper'
WHEN ROUND(AVG(TotalMinutesAsleep)/60,2) >= 6 AND ROUND(AVG(TotalMinutesAsleep)/60,2)< 7
THEN 'Normal Sleeper'
ELSE 'Good Sleeper'
END AS Sleeper_type
FROM bellabeat.sleeplog2
GROUP BY id);
#Determining their contribustion in our dataset
SELECT
Sleeper_type,
COUNT(Sleeper_type)/(SELECT COUNT(*) FROM sleeper_type)*100 AS Percent
FROM bellabeat.sleeper_type
GROUP BY Sleeper_type
ORDER BY Sleeper_type;
| Sleeper_type | Percent |
|---|---|
| Bad Sleeper | 20.8333 |
| Good Sleeper | 45.8333 |
| Normal Sleeper | 20.8333 |
| Unhealthy Sleeper | 12.5000 |
#Categorising Weight Status
CREATE TABLE weight_status AS(
SELECT
Id,
ROUND(AVG(WeightKg),2) AS AverageWeight,
ROUND(AVG(BMI),2) AS AverageBMI,
CASE
WHEN ROUND(AVG(BMI),2) < 18.5
THEN 'Underweight'
WHEN ROUND(AVG(BMI),2) >= 18.5 AND ROUND(AVG(BMI),2) < 24.9
THEN 'Healthy Weight'
WHEN ROUND(AVG(BMI),2) >= 25.0 AND ROUND(AVG(BMI),2) < 29.9
THEN 'Overweight'
ELSE 'Obese'
END AS Weight_Status
FROM bellabeat.weightlog
GROUP BY id);
#Determining their contribustion in our dataset
SELECT
Weight_Status,
COUNT(Weight_Status)/(SELECT COUNT(*) FROM weight_status)*100 AS Percent
FROM bellabeat.weight_status
GROUP BY Weight_Status
ORDER BY Weight_Status;
| Weight_Status | Percent |
|---|---|
| Healthy Weight | 37.5000 |
| Obese | 12.5000 |
| Overweight | 50.0000 |
#Categorising User type
CREATE TABLE ProductWorn AS (
SELECT
Id,
VeryActiveMinutes+FairlyActiveMinutes+LightlyActiveMinutes+SedentaryMinutes AS TotalMiniutesWorn,
ROUND((VeryActiveMinutes+FairlyActiveMinutes+LightlyActiveMinutes+SedentaryMinutes)/1440*100,2) AS Percent,
CASE
WHEN ROUND((VeryActiveMinutes+FairlyActiveMinutes+LightlyActiveMinutes+SedentaryMinutes)/1440*100,2) >= 50 AND
ROUND((VeryActiveMinutes+FairlyActiveMinutes+LightlyActiveMinutes+SedentaryMinutes)/1440*100,2)< 70
THEN 'light User'
WHEN ROUND((VeryActiveMinutes+FairlyActiveMinutes+LightlyActiveMinutes+SedentaryMinutes)/1440*100,2) >= 70 AND
ROUND((VeryActiveMinutes+FairlyActiveMinutes+LightlyActiveMinutes+SedentaryMinutes)/1440*100,2)< 90
THEN 'Moderate User'
ELSE 'High User'
END AS User_type
FROM bellabeat.dailyactivity
GROUP BY Id);
#Determining their contribustion in our dataset
SELECT
User_type,
COUNT(User_type)/(SELECT COUNT(*) FROM productworn)*100 AS Percent
FROM bellabeat.productworn
GROUP BY User_type
ORDER BY User_type;
| User_type | Percent |
|---|---|
| High User | 60.6061 |
| light User | 27.2727 |
| Moderate User | 12.1212 |
SELECT
Id,
ROUND(AVG(TotalTimeInBed-TotalMinutesAsleep),2) AS TimeDiffMins,
ROUND(AVG(TotalTimeInBed-TotalMinutesAsleep)/TotalTimeInBed*100,2) AS Percent
FROM bellabeat.sleeplog2
GROUP BY Id
ORDER BY Percent DESC
Here are the Top 10 users who take maximum time to sleep
| Id | TimeDiffMins | Percent |
|---|---|---|
| 1644430081 | 52.00 | 40.94 |
| 3977333714 | 167.50 | 35.71 |
| 1844505072 | 309.00 | 32.15 |
| 2320127002 | 8.00 | 11.59 |
| 8378563200 | 40.81 | 11.46 |
| 5553957443 | 42.39 | 9.13 |
| 4558609924 | 12.40 | 9.05 |
| 2347167796 | 44.53 | 8.38 |
| 6117666160 | 31.39 | 7.88 |
| 6775888955 | 19.33 | 7.43 |
There were some limitations in the data set like there was 24 unique ID in the Sleep data and only 8 IDs in the Weight data set but I tried to derive some insights from it.
The highest average step count and calorie intake are on Tuesday and Saturday. They hit their lowest on Friday and Sunday.
The average step count is below 10000 steps which means people are not able to achieve recommended steps of 10000 each day.
Only 45% of the people are good sleepers who are having around 7-8 hours of sleep. 20% of the people are bad sleepers with 4-6 hours of sleep. Just like bad sleepers, 20% of the people are normal sleepers with average sleep of 6-7 hours. 12.5% of people are unhealthy sleepers with less than 4 hours of sleep daily.
50% of people are overweight with a BMI between 25-29.9. 37.5% of people are in the category of healthy weight with a BMI between 18.5-24.5. 12.5% of people are suffering from obesity with BMI greater than 29.9.
60% of the users wear their wearables almost all day. 27% of people are lights users who wear their wearables around 50-70% part of their day. 12% are moderate users who wear their wearables around 70-90% part of their day.
Sleep data shows 24 Ids out of 33 Ids which means people might don’t wear their wearables at night due to reasons like comfort, notifications etc.
Some users take around 30-40% time of their total time in bed to sleep which means they keep using their phone or do some kind of work after going to the bed.
We can gamify our UI such as receiving points for meeting or exceeding daily step goals, sleeping hours, and so on. Once enough points have been acquired, then the user can exchange them for a discount coupon to get some off on a future wearable purchase.
As we have seen that some users don’t wear their wearables at night and this issue can be solved if we market our product in such a way that "It has a comfortable strap and lightweight main unit".
We can collaborate with some popular apps like HealthifyMe and My Fitness Pal to track their customer data and use it for our purpose because the dataset that we have used in this study is old and not complete. We can run our marketing campaigns on their apps too.
We can do promotional activities in corporate offices and universities like we can create awareness programs for the staff and students that “How a healthy lifestyle can lead to a better and long life” and How Bellabeat wearables can help you achieve that lifestyle.